An ensemble approach to the analysis of weighted networks
We present a new approach to the analysis of weighted networks, by providing a straightforward
generalization of any network measure defined on unweighted networks, such as the average degree of
the nearest neighbours, the clustering coefficient, the 'betweenness', the distance between two nodes
and the diameter of a network. All these measures are well established for unweighted networks
but have hitherto proven difficult to define for weighted networks. Our approach is based on the
translation of a weighted network into an ensemble of edges. Further to introducing this approach
we demonstrate its advantages by applying the clustering coefficient constructed in this way to two
real-world weighted networks.



Weighted complex networks appear in many different contexts, for
example when studying transport and traffic [1, 2], in the form of
trade or communication networks, financial networks [3], and
collaboration networks [4], to name a few. In addition,
high-throughput technology has generated large amounts of biological
data which can be interpreted in terms of weighted networks, such as
networks of genetic regulation and transcription [5] and protein
interaction [6]. While such networks can now be generated relatively
easily, the extraction of meaningful physical or biological
information from these networks is a much more challenging task. For
unweighted complex networks - in which the entries of the adjacency
matrix are restricted to zero and one - a set of local and global
measures on the network has been defined [7], including the degree of
a node, its average nearest-neighbour degree [8] and its clustering
coefficient [9]. Further measures include the distance between two
nodes, the related diameter of the network and the betweenness [10]
of an edge or node. While the definition of such measures for
unweighted networks is relatively straightforward, defining these
measures for weighted networks is more difficult and has been the
subject of recent research.

Here we introduce a new approach to this problem which allows for a
straightforward generalization of any measure defined on an
unweighted network to weighted networks. In addition we explicitly
construct weighted versions of the clustering coefficient, the
average degree of neighbours, the distance between two nodes and the
diameter of the network. We compare this newly constructed clustering
coefficient to a weighted clustering coefficient in the literature
and to a version used in unweighted networks. The data sets we use
for this comparison are aviation passenger data within the EU, which
constitutes an almost fully connected network, and the network formed
by neighbouring letters in the English language.

Ensemble networks — The basis of our approach is to find a continuous
bijective map $M : \mathbb{R} \to [0, 1]$ from the real numbers to the
interval between 0 and 1, which maps the weights
$w_{ij} \in \mathbb{R}$ to a quantity $p_{ij} \in [0, 1]$. A simple
example of such a map is a linear normalization of the weights:

$$p_{ij} = \frac{w_{ij} - \min(w_{ij})}{\max(w_{ij}) - \min(w_{ij})} \tag{1}$$

This simple normalization maps $\min w_{ij}$ to zero. This is often
acceptable in the case of a distance matrix, but if there are many
edges with weight $\min w_{ij}$, one should introduce a parameter
$\epsilon \ll 1$, such that:

$$p_{ij} = \frac{w_{ij} - \min(w_{ij}) + \epsilon}{\max(w_{ij}) - \min(w_{ij}) + \epsilon} \tag{2}$$

Many other more sophisticated maps are imaginable and the final
choice of map depends on the properties of the physical system
underlying the network and the resulting distribution of weights.
Appropriately chosen maps can deal with all variants of weighted
networks including those with negative weights, and with differing
interpretations of $w_{ij} = 0$ as meaning 'no edge' or as a physical
weight. We will return to the topic of map choice below.
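As a minimal sketch of eqs. (1) and (2), assuming the weights are held in a NumPy array and that the minimum and maximum are taken over all entries of the matrix (the function name `linear_map` and the example matrix are illustrative and do not come from the original work):

```python
import numpy as np

def linear_map(weights, eps=0.0):
    """Map real weights to [0, 1] by linear normalization.

    eps = 0 corresponds to eq. (1); a small eps > 0 corresponds to eq. (2),
    which keeps the smallest weight from being mapped to exactly zero.
    """
    w = np.asarray(weights, dtype=float)
    w_min, w_max = w.min(), w.max()
    span = w_max - w_min + eps
    if span == 0:
        raise ValueError("all weights are equal; choose eps > 0")
    return (w - w_min + eps) / span

# Hypothetical symmetric weight matrix for three nodes
W = np.array([[0.0, 2.0, 10.0],
              [2.0, 0.0, 4.0],
              [10.0, 4.0, 0.0]])
P1 = linear_map(W)            # eq. (1): smallest weight maps to 0
P2 = linear_map(W, eps=0.01)  # eq. (2): smallest weight maps to a small positive value
```

With `eps=0` the smallest entry cannot be told apart from an absent edge, which is the situation eq. (2) is meant to avoid.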
The ideas we introduce in this paper are based on an interpretation
of the matrix $P$ with entries $\{p_{ij}\}$ as a matrix of
probabilities. These probabilities can be interpreted as an ensemble
of edges, or more concisely, an ensemble network. Thus, just as any
binary square matrix can be understood as an unweighted network and
any real square matrix corresponds to a weighted network, any square
matrix with entries between 0 and 1 corresponds to an ensemble
network. If we sample each edge of the ensemble network exactly once,
we obtain an unweighted network which we term a realization of the
ensemble network. In particular, $p_{ij}$ is the probability that the
edge between nodes $i$ and $j$ exists. These concepts are valid both
for directed networks, with any $p_{ij} \in [0, 1]$, and undirected
networks, for which $p_{ij} = p_{ji}$, so that the matrix is
symmetric. Note that, while some specific weighted networks discussed
in the literature have probabilities as their weights [?], a general
framework for the analysis of weighted networks, based on the
transformation of weights to probabilities, has to our knowledge not
been proposed. In a real-world weighted network, the original weights
can represent almost any physical quantity, such as the strength of a
collaboration between two scientists, or the number of passengers
traveling between two countries. This is why we use a map $M$ to
translate the original weights into probabilities. Doing so does not
destroy any of the topological information contained in the weights
and connections, but allows us to analyze this information in the
unifying framework which the probabilities provide. Furthermore, in
many cases of real-world weighted networks, the transformation of
weights to probabilities has a physical meaning. Examples include
flow networks of traffic and transport, communications networks as
well as collaboration networks. In all these, the interactions
between nodes involve the transfer of a discrete unit (e.g.
passengers, currency or data packets) over a given period of time.
Thus the weight, representing the number of units transferred, is
directly related to the probability of observing the transfer of a
unit at a given point in time.
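The notion of a realization can be sketched in a few lines, assuming the ensemble network is stored as a NumPy array of probabilities and each edge is drawn as an independent Bernoulli trial. The function name `sample_realization` and the example matrix are hypothetical:

```python
import numpy as np

def sample_realization(P, rng, directed=False):
    """Draw one unweighted realization (0/1 matrix) of the ensemble network P."""
    P = np.asarray(P, dtype=float)
    draws = rng.random(P.shape) < P   # edge (i, j) exists with probability p_ij
    if directed:
        return draws.astype(int)
    # Undirected case: sample each pair (i <= j) once and mirror the result
    upper = np.triu(draws)
    return (upper | upper.T).astype(int)

# Hypothetical symmetric ensemble network on three nodes
P = np.array([[0.0, 0.9, 0.1],
              [0.9, 0.0, 0.5],
              [0.1, 0.5, 0.0]])
rng = np.random.default_rng(seed=0)
A = sample_realization(P, rng)
```

Averaging many such realizations entry by entry should recover $P$ up to sampling noise.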

In the framework of ensemble networks any existing measure on
unweighted networks can be turned into an equivalent measure on
weighted networks. A suitable choice of map $M$ depends on the
distribution of weights. For example, in both real-world networks
which we analyze in this paper the original weights $w_{ij}$ take
values across several orders of magnitude, so that we chose the
$p_{ij}$ to be the normalized logarithms of the original weights,
rather than the normalized weights themselves.
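The text does not spell out the exact form of the logarithmic map, so the following is only one plausible reading: apply the normalization of eq. (2) to the logarithms of the positive weights and leave absent edges (weight zero) at probability zero. The function name `log_map`, the treatment of zero weights and the example matrix are all assumptions:

```python
import numpy as np

def log_map(weights, eps=0.0):
    """Normalized logarithms of the positive weights; zero weights stay at 0."""
    w = np.asarray(weights, dtype=float)
    present = w > 0                      # assumption: w_ij = 0 means 'no edge'
    if not present.any():
        raise ValueError("no positive weights to map")
    logs = np.log(w[present])
    span = logs.max() - logs.min() + eps
    if span == 0:
        raise ValueError("all positive weights are equal; choose eps > 0")
    p = np.zeros_like(w)
    p[present] = (logs - logs.min() + eps) / span
    return p

# Hypothetical weights spanning four orders of magnitude
W = np.array([[0.0, 1.0, 100.0],
              [1.0, 0.0, 10000.0],
              [100.0, 10000.0, 0.0]])
P = log_map(W, eps=0.1)
```

A linear map would push the two smaller weights in this example very close to zero, whereas the logarithmic map spreads them across the unit interval.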

Polynomials of adjacency matrix entries — All measures on unweighted
networks can be written as functions of the entries $a_{ij}$ of an
adjacency matrix $A$. In fact, generally they can be written as a
polynomial of these entries, or a simple ratio of such polynomials.
Note that, for an unweighted network, $a_{ij} = a_{ij}^m$ for all
positive integers $m > 0$, so that these polynomials are of first
order only. Consider a general first-order polynomial, which can be
written fully expanded as:

$$f(A) = \sum_{q=0}^{2^{N^2}} c_q \prod_{j,k=0}^{N} a_{jk}^{b(q)_{jk}}$$

where $N$ is the number of nodes, the $c_q$ are real coefficients and
the $b(q)_{jk}$ are a set of boolean matrices specifying which
adjacency matrix entries appear in each term of the polynomial. The
probability $P_q$ that $\prod_{j,k=0}^{N} a_{jk}^{b(q)_{jk}} = 1$ in a
given realization $A$ is simply
$P_q = \prod_{j,k=0}^{N} p_{jk}^{b(q)_{jk}}$. Thus, due to the
linearity of the polynomial, the average $\bar{f}(P)$ of $f$ over the
ensemble network realizations is:

$$\bar{f}(P) = \sum_{q=0}^{2^{N^2}} c_q \prod_{j,k=0}^{N} p_{jk}^{b(q)_{jk}} = f(P) \tag{3}$$



This means that the value of a polynomial function $f$ of the entries
of an unweighted network $A$, averaged over the realizations of a
given ensemble network $P$, is equal to the value of the polynomial of
the ensemble network adjacency matrix itself. We will illustrate the
power of this result in the following sections.

Constructing the measures — Our approach allows for the construction
of weighted network measures from their unweighted counterparts. As
almost all existing unweighted measures are for undirected networks,
the measures we construct in the remainder of this Letter are also
undirected. In general, however, our method is equally well suited to
the transformation of any measure for directed, unweighted networks
into one for directed and weighted networks. The degree $k_i$ of a
given node $i$ in an unweighted network with adjacency matrix
elements $a_{ij}$ is the number of its neighbours, and is written as
$k_i = \sum_j a_{ij}$. In a weighted network with elements $w_{ij}$
the corresponding quantity has been termed the strength of the node
$i$, denoted as $s_i$, which consists of the sum of the weights:
$s_i = \sum_j w_{ij}$. In an ensemble network, the corresponding sum
over the edges attached to a particular node gives the average degree
of node $i$ across realizations, denoted as $\bar{k}_i$ and given by
$\bar{k}_i = \sum_j p_{ij}$.

It is important to note that while the strength of a node in a
weighted network may have meaning in the context of the network,
$\bar{k}_i$ has a universal meaning, regardless of the original
meaning of the weights. Now consider the total number of edges $n$ in
a network - also referred to as its size - given by
$n = \sum_{i,j} a_{ij}$ in the directed case and half this value in
the undirected case where $a_{ij} = a_{ji}$. Replacing $a_{ij}$ by
$p_{ij}$ again gives us the average size $\bar{n}$ of the realizations
of the ensemble network, which is simply
$\bar{n} = \sum_{i,j} p_{ij}$ (or half this value for the undirected
case).

A more complex measure in unweighted networks is the average degree
of the nearest neighbours $k_i^{nn}$, which is the number of
neighbours of $i$'s neighbours, divided by the number of neighbours
of $i$ [8]:

$$k_i^{nn} = \frac{\sum_j a_{ij} k_j}{k_i} = \frac{\sum_{j,k} a_{ij} a_{jk}}{\sum_j a_{ij}}$$

where $j \neq i$ in the sums. By rewriting $k_i^{nn}$ solely in terms
of the $a_{ij}$, this generalizes to ensemble networks in a very
straightforward manner:

$$k_i^{nn,e} = \frac{\sum_{j,k} p_{ij} p_{jk}}{\sum_j p_{ij}}$$

This measure $k_i^{nn,e}$ is simply a ratio of averages: the average
number of neighbours of $i$'s neighbours over the average number of
$i$'s neighbours.

For unweighted networks the clustering coefficient of a node $i$ has
been defined [9] as:

$$c_i = \frac{\sum_{j<k} a_{ij} a_{jk} a_{ik}}{k_i (k_i - 1)/2} = \frac{\sum_{j,k} a_{ij} a_{jk} a_{ik}}{\sum_{j,k} a_{ij} a_{ik}} \tag{4}$$

where $k \neq j \neq i \neq k$ in the sums. This corresponds to the
number of triangles in the network which include node $i$, divided by
the number of pairs of bonds including $i$, which represent potential
triangles. Using the ensemble approach with its normalized weights
this generalizes straightforwardly to:

$$c_i^e = \frac{\sum_{j,k} p_{ij} p_{jk} p_{ik}}{\sum_{j,k} p_{ij} p_{ik}} \tag{5}$$



which can be read as the average number of triangles divided by the
average number of bond pairs. In modified form, this clustering
coefficient has appeared in the very recent literature [5] but
without connection to a general approach to the construction of
weighted network measures based on a general mapping from weights to
probabilities. Note that $k_i^{nn,e}$ and $c_i^e$ are not the averages
of $k_i^{nn}$ and $c_i$ over the ensemble. We will address this
subtlety below.
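The ensemble degree, the ensemble nearest-neighbour degree and the ensemble clustering coefficient of eq. (5) can all be evaluated with a few matrix operations. The sketch below assumes an undirected ensemble network stored as a symmetric NumPy array and ignores self-loops by zeroing the diagonal, which enforces the index restrictions on the sums. The function names are illustrative:

```python
import numpy as np

def _ratio(num, den):
    """Element-wise num / den, with NaN where the denominator is zero."""
    out = np.full(num.shape, np.nan)
    np.divide(num, den, out=out, where=den > 0)
    return out

def ensemble_measures(P):
    """Return (average degree, ensemble k^nn, ensemble clustering) per node."""
    Q = np.array(P, dtype=float)
    np.fill_diagonal(Q, 0.0)                 # assumption: self-loops are ignored
    k_bar = Q.sum(axis=1)                    # sum_j p_ij
    knn = _ratio(Q @ k_bar, k_bar)           # sum_{j,k} p_ij p_jk / sum_j p_ij
    triangles = ((Q @ Q) * Q).sum(axis=1)    # sum_{j,k} p_ij p_jk p_ik
    pairs = k_bar**2 - (Q**2).sum(axis=1)    # sum_{j != k} p_ij p_ik
    return k_bar, knn, _ratio(triangles, pairs)

# Hypothetical ensemble network in which every edge has probability 0.5
P = 0.5 * (np.ones((4, 4)) - np.eye(4))
k_bar, knn, c = ensemble_measures(P)
```

Passing a binary adjacency matrix instead of $P$ gives the ordinary unweighted degree, nearest-neighbour degree and clustering coefficient, since the formulas are the same polynomials evaluated on different entries.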

As an example of the power of eq. (3), consider the distance $d_{ij}$
(i.e. the shortest path) between two nodes $i$ and $j$ in an
unweighted $N$-node network, represented entirely as a function of
adjacency matrix entries:

$$d_{ij}(A) = a_{ij} + (1 - a_{ij}) \sum_{m} (m + 1)\, \alpha_{ij}^{(m)}(A)\, \beta_{ij}^{(m)}(A)$$

where $\alpha_{ij}^{(m)}(A) = \prod_{l=1}^{m-1} [1 - \beta_{ij}^{(l)}(A)]$,
the $\beta_{ij}^{(m)}(A)$ are themselves polynomials in the entries of
$A$, and all $\prod$ without a range are equal to one. As $d_{ij}$ is
a first-order polynomial in $a_{ij}$ - the elements of the adjacency
matrix $A$ - we know immediately from eq. (3) that the average
distance in the ensemble network will be
$\bar{d}_{ij}(P) = d_{ij}(P)$. Thus we have defined a distance measure
on weighted networks without having to define a pairwise distance
function of the edge weights (such as, for example,
$d_{ij} = (w_{ij})^{-1}$ [4]).
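For the smallest non-trivial case this can be checked by hand. The sketch below assumes an undirected three-node network and the natural reading that the only indirect route between nodes 0 and 2 is the two-step path through node 1, so that the distance polynomial reduces to $d_{02} = a_{02} + 2\,(1 - a_{02})\, a_{01} a_{12}$ (which evaluates to zero when no path exists). It then enumerates all eight realizations and compares the exact ensemble average with the polynomial evaluated on the probabilities. The function names and probability values are illustrative:

```python
import itertools

def d02(a01, a12, a02):
    """Distance polynomial between nodes 0 and 2 of a three-node network."""
    return a02 + (1 - a02) * 2 * a01 * a12

def average_d02(p01, p12, p02):
    """Exact ensemble average of d02 over all 2**3 realizations."""
    total = 0.0
    for a01, a12, a02 in itertools.product([0, 1], repeat=3):
        prob = ((p01 if a01 else 1 - p01)
                * (p12 if a12 else 1 - p12)
                * (p02 if a02 else 1 - p02))
        total += prob * d02(a01, a12, a02)
    return total

# Hypothetical edge probabilities
p01, p12, p02 = 0.7, 0.4, 0.2
avg = average_d02(p01, p12, p02)
poly = d02(p01, p12, p02)   # the same polynomial evaluated on P
```

The two numbers agree because each edge variable appears at most once in every term of the expanded polynomial, which is the first-order property that eq. (3) relies on.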

Similarly, the diameter of an unweighted network, defined as the
maximum distance $D(A) = \max d_{ij}(A)$ between two nodes out of all
pairs of nodes $i, j$, can be written as a first-order polynomial:

$$D(A) = \prod_{i,j} a_{ij} + \sum_{m=1}^{N} (m + 1)\, \zeta^{(m)}(A)\, \xi^{(m)}(A)$$

where $\zeta^{(m)}(A) = \prod_{l=0}^{m-1} [1 - \xi^{(l)}(A)]$ and
$\xi^{(m)}(A) = \prod_{i,j} \beta_{ij}^{(m)}(A)$. This expression
allows us to straightforwardly calculate the average diameter
$\bar{D}(P) = D(P)$ of the ensemble network.

Another measure, the betweenness [10] of a node $i$ or an edge
$(i, j)$, is the number of different shortest paths in the network
which run through $i$ or $(i, j)$. Like measures such as the distance
and diameter, the betweenness can also be generalized to the weighted
case by simply replacing the $a_{ij}$ by $p_{ij}$. As the expressions
in terms of adjacency matrix entries are rather involved, we do not
give them here explicitly.

FIG. 1: Analysis of the network of air travel passengers within the
25 member states of the EU. This network is almost fully connected.
TOP: Unweighted clustering coefficient versus degree. All 25 data
points are projected onto 7 locations, as a result of the information
loss due to discarding the weights, and because the network is almost
fully connected. MIDDLE: Clustering coefficient as proposed in the
literature [2] versus strength. This "mixed" clustering coefficient
is a function of unweighted and weighted quantities. No clear
relationship is evident, again because the network is almost fully
connected. BOTTOM: Ensemble clustering coefficient versus ensemble
degree. Unlike the other two approaches, those derived using the
ensemble quantities exhibit a clear negative linear relationship. The
lines are lines of best fit. Note that the absolute scale of the
ensemble clustering coefficient $c_i^e$ depends on the choice of the
map $M$ from weights to probabilities, which makes the relative
values of $c_i^e$ more important than the absolute ones.

FIG. 2: Analysis of the weighted network formed between the 26
letters of the alphabet and the space between words [?]. TOP:
Unweighted clustering coefficient versus degree. MIDDLE: Clustering
coefficient as proposed in the literature [2] versus strength. This
"mixed" clustering coefficient is a function of unweighted and
weighted quantities. BOTTOM: Ensemble clustering coefficient versus
ensemble degree. The ensemble approach makes use of all information
contained in the weights, while the two others lose some of the
information, as is shown by the plateau which both exhibit on the
left side of the plots. The diagonal lines are lines of best fit for
data points below the plateau. Note that the absolute scale of the
ensemble clustering coefficient $c_i^e$ depends on the choice of the
map $M$ from weights to probabilities, which makes the relative
values of $c_i^e$ more important than the absolute ones.



Some measures on unweighted networks, such as the average neighbour
degree $k_i^{nn}$ and the clustering coefficient $c_i$, are ratios of
two adjacency matrix polynomials $f$ and $g$, which in general can be
written as $h(A) = f(A)/g(A)$. Now we can define the quantity
$h^e(P) = \bar{f}(P)/\bar{g}(P) = h(P)$. But, as was pointed out
above, this quantity is no longer an average of $h(A)$ itself (which
would be denoted $\bar{h}(P)$). This gives us two distinct classes of
measures: The first contains measures which can be written in
polynomial form, and for which the ensemble version gives the average
across realizations. These measures represent countable, integer
quantities of the network, such as the number of neighbours, the
number of triangles, the length of the shortest path between $i$ and
$j$, and so on. The second class are measures which are ratios of
polynomials, such as the average nearest-neighbour degree or the
clustering coefficient. The ensemble network version of these
measures gives the ratio of the averages.
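The difference between the two classes can be made concrete by exact enumeration on a small example. The sketch below assumes an undirected four-node ensemble network without self-loops, chosen so that node 0 always has at least two neighbours and its clustering coefficient is defined in every realization. It averages the triangle count, the bond-pair count and their ratio at node 0 over all realizations. The function names and the matrix are hypothetical:

```python
import itertools
import numpy as np

def triangles_and_pairs(M, i):
    """Triangle and bond-pair counts at node i, as polynomials in the entries of M."""
    others = [j for j in range(M.shape[0]) if j != i]
    pairs = list(itertools.combinations(others, 2))
    t = sum(M[i, j] * M[j, k] * M[i, k] for j, k in pairs)
    b = sum(M[i, j] * M[i, k] for j, k in pairs)
    return t, b

def ensemble_averages(P, i):
    """Exact averages of triangles, bond pairs and their ratio over all realizations."""
    n = P.shape[0]
    edges = list(itertools.combinations(range(n), 2))
    mean_t = mean_b = mean_c = 0.0
    for bits in itertools.product([0, 1], repeat=len(edges)):
        A = np.zeros((n, n))
        prob = 1.0
        for (u, v), a in zip(edges, bits):
            A[u, v] = A[v, u] = a
            prob *= P[u, v] if a else 1.0 - P[u, v]
        if prob == 0.0:
            continue                      # realization cannot occur
        t, b = triangles_and_pairs(A, i)
        if b == 0:
            raise ValueError("clustering coefficient undefined in a realization")
        mean_t += prob * t
        mean_b += prob * b
        mean_c += prob * t / b
    return mean_t, mean_b, mean_c

# Edges (0,1) and (0,2) are certain, so node 0 always has a bond pair
P = np.array([[0.0, 1.0, 1.0, 0.5],
              [1.0, 0.0, 0.9, 0.1],
              [1.0, 0.9, 0.0, 0.1],
              [0.5, 0.1, 0.1, 0.0]])
mean_t, mean_b, mean_c = ensemble_averages(P, i=0)
t_P, b_P = triangles_and_pairs(P, 0)
```

In this example `mean_t` and `mean_b` coincide with `t_P` and `b_P`, as eq. (3) predicts for polynomials, while `mean_c` differs from the ratio `t_P / b_P`, which is the ensemble clustering coefficient of eq. (5).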

All measures constructed with the ensemble approach are only
functions of the normalized weights $p_{ij}$, not of the elements of
an unweighted adjacency matrix $a_{ij}$ or of the degree $k$. This
distinguishes the ensemble measures from measures proposed for
weighted networks in the literature, such as the weighted clustering
coefficient $c_i^w$:

$$c_i^w = \frac{1}{s_i (k_i - 1)} \sum_{j,k} \frac{w_{ij} + w_{ik}}{2}\, a_{ij} a_{ik} a_{jk} \tag{6}$$

and the weighted average nearest-neighbour degree $k_{nn,i}^w$:

$$k_{nn,i}^w = \frac{1}{s_i} \sum_{j} a_{ij} w_{ij} k_j \tag{7}$$



Both are defined in [2]. Due to their construction, these measures
cannot be used for the analysis of fully connected weighted networks,
as $k_{nn,i}^w = N - 1$ and $c_i^w = 1$ for all nodes $i$ in such
networks, where every node has degree $N - 1$. Fully connected
weighted networks form an important class of complex networks, for
example in the form of the (virtually fully connected) EU air travel
network which we analyze in this Letter. Furthermore any matrix of
similarities or distances between a number of objects - such as for
instance microarray data series in biological experiments - can be
treated as a fully connected weighted network, and thus can be
analyzed using the ensemble approach, but not with approaches such as
eqs. (6) and (7), which are "mixed" in the sense that they make use of
both the unweighted and weighted adjacency matrix entries.
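The loss of information on fully connected networks can be illustrated numerically. The sketch below assumes the mixed coefficient takes the form of eq. (6), with a symmetric non-negative weight matrix, no self-loops and every node of degree at least two. For brevity it uses a simple rescaling by the largest weight as the map to probabilities rather than eq. (1). The function names and the weight matrix are hypothetical:

```python
import numpy as np

def mixed_clustering(W):
    """Mixed weighted clustering coefficient of eq. (6) for symmetric W."""
    W = np.array(W, dtype=float)
    np.fill_diagonal(W, 0.0)
    A = (W > 0).astype(float)        # unweighted adjacency matrix
    k = A.sum(axis=1)                # degree
    s = W.sum(axis=1)                # strength
    # By symmetry in j and k, sum_{j,k} (w_ij + w_ik)/2 a_ij a_ik a_jk
    # equals sum_{j,k} w_ij a_jk a_ik
    num = ((W @ A) * A).sum(axis=1)
    return num / (s * (k - 1))

def ensemble_clustering(P):
    """Ensemble clustering coefficient of eq. (5), ignoring self-loops."""
    Q = np.array(P, dtype=float)
    np.fill_diagonal(Q, 0.0)
    triangles = ((Q @ Q) * Q).sum(axis=1)
    pairs = Q.sum(axis=1) ** 2 - (Q ** 2).sum(axis=1)
    return triangles / pairs

# Hypothetical fully connected weighted network on four nodes
W = np.array([[0, 1, 2, 3],
              [1, 0, 4, 5],
              [2, 4, 0, 6],
              [3, 5, 6, 0]], dtype=float)
c_mixed = mixed_clustering(W)             # identical for every node
c_ens = ensemble_clustering(W / W.max())  # differs from node to node
```

Every entry of `c_mixed` equals one regardless of the weights, whereas `c_ens` still separates the nodes.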

Analyzing real-world weighted networks — In the following we
demonstrate some of the advantages which the ensemble approach has
over unweighted network measures, as well as over mixed weighted
network measures. We do this by applying the ensemble clustering
coefficient of eq. (5) to two real-world networks. The first is the
network of passengers travelling by air within the EU during 2004
[?]. The second is a network of letters in the English language,
where the weight of the edge between two letters is determined by the
frequency at which they appear next to each other in the English
language [?]. Both networks include edges which lead from a node to
itself. The network of letters has 485 edges between 27 nodes (the
alphabet and space), and therefore is 62.8% connected, while the EU
network with 607 connections between the 25 member states of the EU
is almost fully (97.1%) connected.

In Fig. 1 we show the analysis of the EU air travel network using
three different clustering coefficients: the unweighted clustering
coefficient $c_i$ of eq. (4), the mixed weighted clustering
coefficient $c_i^w$ of eq. (6) with weighted and unweighted
components from the literature [2], and the clustering coefficient
$c_i^e$ of eq. (5) derived from the ensemble approach. These
quantities are plotted against the degree $k$ (in the unweighted
case), the strength $s_i$ for the mixed approach, and the ensemble
degree $\bar{k}_i$ for the ensemble approach. As this network is
almost fully (97.1%) connected, the difficulty of the unweighted and
mixed approaches becomes apparent: For the unweighted case, the 25
nodes of the network are mapped to just 7 points, representing the
information lost by dropping the weights. In the mixed case, little
can be deduced about the relationship between the clustering
coefficient $c_i^w$ and the strength $s_i$. The ensemble approach on
the other hand reveals a clear negative linear relationship between
the ensemble clustering coefficient $c_i^e$ of eq. (5) and the
ensemble degree $\bar{k}_i$. Note that the absolute values of the
ensemble clustering coefficient do not mean very much, as they are
dependent on the map $M$. It is their relative values which carry the
information, and these are largely independent of the choice of map
$M$, as long as it is bijective. Countries with a large number of air
passengers travelling in and out have a high ensemble degree
$\bar{k}_i$ but also a low ensemble clustering coefficient $c_i^e$, as
the many countries they are connected to strongly are mostly not
well-connected themselves. Thus these nodes with low $c_i^e$ are
surrounded by few triangles in any given ensemble realization, but
many potential triangles in the form of pairs of edges. The inverse
argument is true for nodes with a low ensemble degree $\bar{k}_i$, as
any two neighbours of such a node are more likely to be strongly
connected. For example, the two countries at the bottom right of the
plot (high $\bar{k}_i$, low $c_i^e$) are the UK and Germany, while the
top left corner (low $\bar{k}_i$, high $c_i^e$) contains Lithuania,
Estonia and Slovakia.

In Fig. 2 we show the analysis of the letter network using the same
three clustering coefficients. As the letter network is less than
two-thirds (62.8%) connected, the unweighted and mixed approaches do
not encounter the difficulties associated with fully connected
networks. However, if there are clusters in the network which are
fully connected on a local scale - such that all neighbours of a
given node are fully connected - these approaches again cannot
differentiate any further between such nodes. In both the unweighted
and mixed cases the letters Q, Z, J and V are affected, as these
letters have only a few neighbours, which are fully connected among
themselves, making the unweighted and mixed clustering coefficients
equal to one. In Fig. 2 these four letters are represented by the
four data points on the plateau which appears in the plots for the
unweighted and mixed measures. No information however is lost with
the ensemble approach, which again shows a clear negative linear
relationship between ensemble clustering coefficient and ensemble
degree. As before, the implication of this is that nodes with many
strong connections - in this case the vowels A, E, I, O and U, which
are located at the bottom right of the plots in Fig. 2 - have
neighbours which are weakly connected among each other. These are the
consonants, which are mostly located in the top left corner (low
$\bar{k}_i$, high $c_i^e$).

Conclusion — We have introduced a general approach for the
construction of measures on weighted networks, by introducing the
concept of an ensemble network, in which every edge has a probability
$p_{ij}$ of existing. By transforming a weighted network into an
ensemble network, any of the numerous measures which have been
defined for unweighted networks can be straightforwardly generalized
to weighted networks. Using the clustering coefficient constructed in
this way as an example we demonstrate that these measures on weighted
networks can reveal the additional topological information given in
the weights, in particular for fully connected networks.